Search results for "circular word"

showing 9 items of 9 documents

Alignment-free sequence comparison using absent words

2018

Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as $q$-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not oc…

0301 basic medicine; FOS: Computer and information sciences; Formal Languages and Automata Theory (cs.FL); Computer Science - Formal Languages and Automata Theory; Sequence alignment; Information System; 0102 computer and information sciences; Circular word; Absent words; 01 natural sciences; Upper and lower bounds; Sequence comparison; Theoretical Computer Science; Combinatorics; 03 medical and health sciences; Computer Science - Data Structures and Algorithms; Data Structures and Algorithms (cs.DS); Absent word; Circular words; Mathematics; Sequence; Settore INF/01 - Informatica; Process (computing); q-gram; Computer Science Applications; 1707 Computer Vision and Pattern Recognition; q-grams; Composition (combinatorics); Computer Science Applications; 030104 developmental biology; Computational Theory and Mathematics; Forbidden words; 010201 computation theory & mathematics; Focus (optics); Forbidden word; Word (computer architecture); Information Systems; Integer (computer science)
researchProduct
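
The notion of absent word in the abstract above can be illustrated with a small brute-force sketch. It assumes the standard definitions: a word is absent from a sequence if it is not one of its factors (substrings), and it is minimal if all of its proper factors do occur. The function names are hypothetical, and the quadratic enumeration below is only for illustration; it is not the linear-time method the paper refers to.

```python
def factors(s):
    """Return the set of all non-empty factors (substrings) of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def minimal_absent_words(s, alphabet):
    """Brute-force set of minimal absent words of s over the given alphabet.

    A letter is a minimal absent word if it does not occur in s.
    A longer word a+u+b is a minimal absent word if a+u and u+b both
    occur in s while a+u+b does not.
    """
    occurring = factors(s)
    result = {a for a in alphabet if a not in occurring}
    for au in occurring:
        u = au[1:]  # drop the first letter a
        for b in alphabet:
            if (u + b) in occurring and (au + b) not in occurring:
                result.add(au + b)
    return result


if __name__ == "__main__":
    print(sorted(minimal_absent_words("abaab", "ab")))
```

For the word "abaab" over the alphabet {a, b}, every word in the returned set is missing from the sequence, yet becomes a factor as soon as its first or last letter is removed.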

Marked systems and circular splicing

2007

Splicing systems are generative devices of formal languages, introduced by Head in 1987 to model biological phenomena on linear and circular DNA molecules. In this paper we introduce a special class of finite circular splicing systems named marked systems. We prove that a marked system S generates a regular circular language if and only if S satisfies a special (decidable) property. As a consequence, we show that we can decide whether a regular circular language is generated by a marked system and we characterize the structure of these regular circular languages.

Discrete mathematics; Property (programming); Structure (category theory); Molecular computing; Circular word; Decidability; Regular language; If and only if; RNA splicing; Formal language; Splicing system; Formal language; Generative grammar; Automata theory; Mathematics
researchProduct

Suffixes, Conjugates and Lyndon Words

2013

In this paper we are interested in the study of the combinatorial aspects connecting three important constructions in the field of string algorithms: the suffix array, the Burrows-Wheeler transform (BWT) and the extended Burrows-Wheeler transform (EBWT). Such constructions involve the notions of suffixes and conjugates of words and are based on two different order relations, denoted by $<_{lex}$ and $\prec_\omega$, that, even if strictly connected, are quite different from the computational point of view. In this study an important role is played by Lyndon words. In particular, we improve the upper bound on the number of symbol comparisons needed to establish the $\prec_\omega$ order between two primitive wo…

Multiset; Reduction (recursion theory); BWT; Lyndon factorization; Suffix Array; String (computer science); Suffix array; Lyndon words; Lyndon factorization; BWT; Suffix array; EBWT; Circular words; Conjugacy; Lexicographical order; law.invention; Suffix Array; Combinatorics; BWT; Lyndon factorization; law; Order (group theory); Symbol (formal); Word (group theory); Mathematics
researchProduct
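
The notions named in the abstract above (conjugates, Lyndon words, and the two order relations) can be sketched as follows. The sketch assumes the usual definitions: the conjugates of a word are its rotations, a Lyndon word is strictly smaller than all its other conjugates, and the omega-order compares the infinite repetitions of two words. The function names are hypothetical. The comparison length |u| + |v| is a deliberately conservative bound that follows from the Fine and Wilf periodicity theorem; it is not the improved bound the paper announces.

```python
def conjugates(w):
    """Return all rotations (conjugates) of w, starting with w itself."""
    return [w[i:] + w[:i] for i in range(len(w))]


def is_lyndon(w):
    """True if w is non-empty and strictly smaller than every other rotation."""
    return len(w) > 0 and all(w < c for c in conjugates(w)[1:])


def omega_less(u, v):
    """True if the infinite word uuu... is lexicographically smaller than vvv...

    Comparing prefixes of length len(u) + len(v) is enough: two periodic
    words that agree that far are equal (Fine and Wilf).
    """
    n = len(u) + len(v)
    prefix_u = (u * (n // len(u) + 1))[:n]
    prefix_v = (v * (n // len(v) + 1))[:n]
    return prefix_u < prefix_v


if __name__ == "__main__":
    print(conjugates("aab"))        # rotations of "aab"
    print(is_lyndon("aab"))         # smallest rotation, so a Lyndon word
    print("ab" < "aba")             # lexicographic order
    print(omega_less("ab", "aba"))  # omega-order can disagree with it
```

The last two lines show why the two orders differ: "ab" is a proper prefix of "aba" and therefore lexicographically smaller, while "ababab..." is larger than "abaaba...".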

A characterization of regular circular languages generated by marked splicing systems

2009

Splicing systems are generative devices of formal languages, introduced by Head in 1987 to model biological phenomena on linear and circular DNA molecules. A splicing system is defined by giving an initial set I and a set R of rules. Some unanswered questions are related to the computational power of circular splicing systems. In particular, a still open question is to find a characterization of circular languages generated by finite circular splicing systems (i.e., circular splicing systems with both I and R finite sets). In this paper we introduce a special class of the latter systems named marked systems. We prove that a marked system S generates a regular circular language if an…

Pure mathematics; General Computer Science; Molecular computing; Splicing systems; Circular words; Formal languages; Automata theory; Molecular computing; Quantitative Biology::Genomics; Decidability; Theoretical Computer Science; Set (abstract data type); Formal languages; Regular language; Formal language; RNA splicing; Automata theory; Splicing systems; Circular words; Finite set; Algorithm; Word (computer architecture); Automata theory; Mathematics; Computer Science (all)
researchProduct

On the regularity of circular splicing languages : A survey and new developments

2009

Circular splicing has been introduced to model a specific recombinant behaviour of circular DNA, continuing the investigation initiated with linear splicing. In this paper we focus on the relationship between regular circular languages and languages generated by finite circular splicing systems. We survey the known results towards a characterization of the intersection between these two classes and provide new contributions on the open problem of finding this characterization. First, we exhibit a non-regular circular language generated by a circular simple system thus disproving a known result in this area. Then we give new results related to a restrictive class of circular splicing systems…

Discrete mathematics; Computer science; Open problem; INF/01 - INFORMATICA; Graph theory; Circular word; Molecular computing; Computer Science Applications; Graph theory; Automata theory; Circular words; Formal languages; Graph theory; Molecular computing; Splicing systems; Intersection; Formal language; Theory of computation; Graph (abstract data type); Cograph; Formal language; Splicing system; Complement (set theory); Automata theory
researchProduct

SORTING CONJUGATES AND SUFFIXES OF WORDS IN A MULTISET

2014

In this paper we are interested in the study of the combinatorial aspects related to the extension of the Burrows-Wheeler transform to a multiset of words. Such study involves the notion of suffixes and conjugates of words and is based on two different order relations, denoted by <lex and ≺ω, that, even if strictly connected, are quite different from the computational point of view. In particular, we introduce a method that only uses the <lex sorting among suffixes of a multiset of words in order to sort their conjugates according to ≺ω-order. In this study an important role is played by Lyndon words. This strategy could be used in applications specially in the field of Bioinformatic…

Lyndon words; Burrows-Wheeler transform; Extended Burrows-Wheeler transform; Circular words; Conjugates; Suffixes; Sorting; Suffixes; Multiset; Theoretical computer science; Point (typography); Burrows–Wheeler transform; Settore INF/01 - Informatica; Sorting; circular word; Extension (predicate logic); Lyndon words; Burrows-Wheeler transform; Lyndon word; Field (computer science); Conjugates; conjugate; Computer Science (miscellaneous); sort; Order (group theory); suffixe; Arithmetic; extended Burrows-Wheeler transform; Circular words; sorting; Mathematics
researchProduct
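
As a point of reference for the abstract above, the extension of the Burrows-Wheeler transform to a multiset of words can be sketched naively: collect every conjugate of every word, sort them by the omega-order, and read off the last letters. This assumes the usual definition of the extended transform on primitive words; the function names are hypothetical, and this direct sort is not the suffix-based method the paper introduces.

```python
from functools import cmp_to_key


def omega_cmp(u, v):
    """Three-way comparison of the infinite words uuu... and vvv...

    Prefixes of length len(u) + len(v) are sufficient to decide the order.
    """
    n = len(u) + len(v)
    prefix_u = (u * (n // len(u) + 1))[:n]
    prefix_v = (v * (n // len(v) + 1))[:n]
    return (prefix_u > prefix_v) - (prefix_u < prefix_v)


def naive_ebwt(words):
    """Last letters of all conjugates of all words, sorted by omega-order.

    The words are assumed to be non-empty and primitive.
    """
    all_conjugates = [w[i:] + w[:i] for w in words for i in range(len(w))]
    all_conjugates.sort(key=cmp_to_key(omega_cmp))
    return "".join(c[-1] for c in all_conjugates)


if __name__ == "__main__":
    print(naive_ebwt(["banana"]))     # one word: the rotation-based BWT
    print(naive_ebwt(["ab", "aba"]))  # a multiset of two primitive words
```

With a single input word all conjugates have the same length, so the omega-order coincides with the lexicographic order and the result is the ordinary rotation-based transform.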

Linear-time sequence comparison using minimal absent words & applications

2016

Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realized by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this article, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not occur in…

0301 basic medicine; Latin Americans; Computer Science (all); Library science; 0102 computer and information sciences; Circular word; Algorithms on string; 01 natural sciences; Alignment-free comparison; Sequence comparison; Theoretical Computer Science; 03 medical and health sciences; 030104 developmental biology; 010201 computation theory & mathematics; Informatics; Political science; Absent word; Forbidden word
researchProduct
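
The q-gram distance cited in the abstract above as a typical composition-based measure can be sketched in a few lines. The sketch assumes the common definition: the sum, over all words of length q, of the absolute difference between their occurrence counts in the two sequences. The function names are hypothetical.

```python
from collections import Counter


def qgram_profile(s, q):
    """Count every factor of length q in s (overlapping occurrences included)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(s, t, q):
    """Sum of absolute count differences over all q-grams of s and t."""
    profile_s = qgram_profile(s, q)
    profile_t = qgram_profile(t, q)
    return sum(abs(profile_s[g] - profile_t[g])
               for g in set(profile_s) | set(profile_t))


if __name__ == "__main__":
    print(qgram_distance("abaab", "ababa", 2))
```

Building both profiles takes a single pass over each sequence, which is the sense in which such measures are computed in time linear in the sequence lengths (for fixed q and hashing treated as constant time).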

Minimal forbidden factors of circular words

2017

Minimal forbidden factors are a useful tool for investigating properties of words and languages. Two factorial languages are distinct if and only if they have different (antifactorial) sets of minimal forbidden factors. There exist algorithms for computing the minimal forbidden factors of a word, as well as of a regular factorial language. Conversely, Crochemore et al. [IPL, 1998] gave an algorithm that, given the trie recognizing a finite antifactorial language $M$, computes a DFA recognizing the language whose set of minimal forbidden factors is $M$. In the same paper, they showed that the obtained DFA is minimal if the input trie recognizes the minimal forbidden factors of a single word.…

FOS: Computer and information sciences; Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; General Computer Science; Discrete Mathematics (cs.DM); Finite automaton; Settore INF/01 - Informatica; Formal Languages and Automata Theory (cs.FL); Factor automaton; Computer Science - Formal Languages and Automata Theory; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Circular word; Fibonacci word; Minimal forbidden factor; Theoretical Computer Science; Computer Science::Formal Languages and Automata Theory; Computer Science - Discrete Mathematics
researchProduct

Minimal Forbidden Factors of Circular Words

2017

Minimal forbidden factors are a useful tool for investigating properties of words and languages. Two factorial languages are distinct if and only if they have different (antifactorial) sets of minimal forbidden factors. There exist algorithms for computing the minimal forbidden factors of a word, as well as of a regular factorial language. Conversely, Crochemore et al. [IPL, 1998] gave an algorithm that, given the trie recognizing a finite antifactorial language M, computes a DFA of the language having M as set of minimal forbidden factors. In the same paper, they showed that the obtained DFA is minimal if the input trie recognizes the minimal forbidden factors of a single word. We gener…

L-automaton; Discrete mathematics; Factorial; Fibonacci number; Settore INF/01 - Informatica; Computer Science (all); Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); 0102 computer and information sciences; 02 engineering and technology; Circular word; Minimal forbidden factor; 01 natural sciences; Theoretical Computer Science; Set (abstract data type); 010201 computation theory & mathematics; If and only if; Trie; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Computer Science::Formal Languages and Automata Theory; Word (computer architecture); Mathematics
researchProduct